Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC
Authors
Abstract
A comprehensive computational model of the human auditory periphery (AIM) was applied to extract basic features of speech sounds, aiming at optimal unit selection in concatenative speech synthesis. The performance of AIM was compared with that of a purely physical model (LPC) and that of an approximate auditory model (MFCC) through basic perceptual experiments. While a significant advantage of AIM over LPC was observed, performance based on AIM selection did not differ significantly from that based on MFCC selection. However, the phoneme space derived from the AIM features did not completely match the one derived from the MFCC features, indicating that the selection was not yet perfect. A detailed investigation of cases of poor concatenation indicates that acoustic discontinuity at comparatively steady phonemic boundaries, especially those between vowel-like sounds, degrades the perceptual impression. Sensitivity to such discontinuity will be required to further improve acoustic measures for unit selection.
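The MFCC-based selection discussed above can be illustrated with a minimal sketch: compute cepstral features for a speech frame and score candidate units by distance in cepstral space. This is a generic, assumed pipeline (standard mel filterbank plus DCT-II), not the authors' actual implementation; function names, filter counts, and the Euclidean cost are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    # Windowed power spectrum -> mel filterbank energies -> log -> DCT-II.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    logmel = np.log(mel_filterbank(n_filters, len(frame), sr) @ spec + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return basis @ logmel

def target_cost(mfcc_candidate, mfcc_target):
    # Euclidean distance in cepstral space: one simple acoustic measure
    # for ranking candidate units against a target specification.
    return float(np.linalg.norm(mfcc_candidate - mfcc_target))
```

For example, two 25 ms frames (400 samples at 16 kHz) containing sinusoids at 440 Hz and 2000 Hz yield a strictly positive target cost, while a frame scored against itself costs zero.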
Similar articles
Objective distance measures for assessing concatenative speech synthesis
Several different acoustic transforms of the speech signal are compared for use in the assessment and evaluation of concatenative speech synthesis. The transforms tested include LPC, LSP, MFCC, the bispectrum, the Mellin transform of the log spectrum, the Wigner-Ville distribution (WVD), and others. The computed distances between a synthesised utterance and a naturally spoken version of the same sentence are compar...
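Comparing a synthesised utterance against a natural one requires aligning the two feature sequences first, since their timings differ. A common way to do this (assumed here, not necessarily the paper's method) is dynamic time warping over per-frame feature vectors; the sketch below computes a DTW-aligned distance normalised by the two sequence lengths.

```python
import numpy as np

def dtw_distance(A, B):
    # A, B: (T, D) feature sequences, e.g. per-frame MFCC vectors for a
    # synthesised and a natural rendition of the same sentence.
    # Classic dynamic-programming alignment; returns the accumulated
    # path cost normalised by the combined sequence length.
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)
```

Identical sequences score exactly zero, and any per-frame offset produces a positive distance, so the measure behaves sensibly as a dissimilarity.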
Discontinuity Removal in Concatenative Synthesized Speech
Concatenative synthesis concatenates segments of prerecorded natural human speech. It requires a database of previously recorded human speech covering all the possible segments to be synthesised. A segment might be a phoneme, syllable, word, phrase, or any combination of these. Concatenative speech synthesis is currently the most practical method for generating realistic speech. There are mainly two types ...
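One simple way to soften the discontinuity at a concatenation joint, sketched below under the assumption of time-domain segments at a common sample rate, is a linear cross-fade over a short overlap region. This is a generic smoothing technique, not the specific removal method of the paper; the function name and overlap length are illustrative.

```python
import numpy as np

def concatenate_with_crossfade(a, b, overlap):
    # Join two unit waveforms by fading `a` out and `b` in over
    # `overlap` samples, reducing the amplitude discontinuity that a
    # hard splice would leave at the segment boundary.
    fade = np.linspace(0.0, 1.0, overlap)
    return np.concatenate([
        a[:-overlap],
        a[-overlap:] * (1.0 - fade) + b[:overlap] * fade,
        b[overlap:],
    ])
```

The joined signal is `len(a) + len(b) - overlap` samples long, and two units that already match in the overlap region pass through unchanged.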
An artificial intelligence approach to concatenative sound synthesis
Content overview: List of Figures; List of Tables; List of Abbreviations; Acknowledgments; Author's Declaration. Chapter 1: Introduction (1.1 Motivation; 1.2 Introduction; 1.3 Objectives; 1.4 Thesis Structure). Chapter 2: Principles of Concatenative Sound Synthesis (2.1 Sound Synthesis; 2.1.1 Rule-based Model; 2.1.2 Data-driven Model; 2.2 Sub...
Automatic Speaker Recognition using LPCC and MFCC
A person's voice contains various parameters that convey information such as emotion, gender, attitude, health, and identity. This report discusses speaker recognition, which deals with identifying a person based on the unique voiceprint present in their speech data. Pre-processing of the speech signal is performed before voice feature extraction. This process ensures the voice...
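The pre-processing and LPC front end mentioned above can be sketched as follows. Pre-emphasis is a standard first step, and the autocorrelation method with the Levinson-Durbin recursion is one common way to obtain the LPC coefficients that LPCC features are derived from; this is an assumed textbook pipeline, not the report's exact implementation.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: flattens the spectral tilt of speech
    # before feature extraction, a common pre-processing step.
    return np.append(x[0], x[1:] - alpha * x[:-1])

def lpc(frame, order):
    # Autocorrelation method + Levinson-Durbin recursion. Returns the
    # prediction coefficients a (with a[0] == 1) and the residual energy.
    n = len(frame)
    r = [float(np.dot(frame[: n - i], frame[i:])) for i in range(order + 1)]
    a = [1.0]
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                     # reflection coefficient
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        e *= 1.0 - k * k
    return a, e
```

As a sanity check, a noiseless first-order decaying signal x[n] = 0.9^n is perfectly predicted by a single coefficient close to -0.9.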
Feature extraction by auditory modeling for unit selection in concatenative speech synthesis
A comprehensive computational model of the human auditory periphery was applied to extract basic features of speech sounds. The auditory model extracts features through the auditory temporal coding mechanism, in addition to features through the auditory place coding mechanism, which has traditionally been used to obtain spectral features. It also accounts for the nonlinearity of human auditory responses. Several s...